{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据结构和算法\n",
    "\n",
    "### 保留最后 N 个元素\n",
    "\n",
    "`deque` 用于保留有限的历史纪录。它是一个双端队列，在两端插入的时间复杂度均为 `O(1)`。当不指定 `maxlen` 时其长度无限制。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([5, 6, 7, 8, 9])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import deque\n",
    "\n",
    "last_items = deque(maxlen=5)\n",
    "\n",
    "for i in range(10):\n",
    "    last_items.append(i)\n",
    "    \n",
    "last_items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 查找最大或最小的 K 个元素\n",
    "\n",
    "当需要返回一个列表中最大或最小的 K 个元素时，可以使用函数 `nlargest()` 和 `nsmallest()`。\n",
    "\n",
    "如果 K 的大小和列表长度接近时候，堆就没有太大优势了，此时先排序再切片会更好。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import heapq\n",
    "nums = [1, 8, 2, 23, 7, -44, 18, 23, 42, 37, 2]\n",
    "\n",
    "heapq.nlargest(3, nums) # [42, 37, 23]\n",
    "heapq.nsmallest(3, nums) # [-4, 1, 2]\n",
    "heapq.nlargest(3, nums, key=lambda x: abs(x)) # [-44, 42, 37]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 查找两字典的相同点\n",
    "\n",
    "`dict` 的 `keys` 和 `items` 方法返回的对象也支持集合操作。比如要完成对键取交集或并集的操作，不需要先转成 `set`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "({'a', 'b', 'c'}, {('a', 1)})"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = {'a': 1, 'b': 2}\n",
    "b = {'a': 1, 'c': 3}\n",
    "a.keys() | b.keys(), a.items() & b.items()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 同时打乱两个列表\n",
    "\n",
    "有的时候两个列表中的元素对应下标是有关系的，因此在打乱的时候，需要把两个列表按照相同的方式打乱。\n",
    "\n",
    "```python\n",
    "name_age_list = list(zip(names, ages))\n",
    "np.random.shuffle(name_age_list)\n",
    "names[:], ages[:] = zip(*name_age_list)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 命名切片\n",
    "\n",
    "使用命名切片可以让代码更易读。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('ID.1234', '10.30')"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "record = 'ID.1234-10.30'\n",
    "\n",
    "ID = slice(0, 7)\n",
    "DATE = slice(8, None)\n",
    "record[ID], record[DATE]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 根据某一键的值来对字典列表排序\n",
    "\n",
    "通常可以使用 `lambda` 表达式来获取某个键的值，不过使用 `itemgetter` 更为方便，对于对象可以使用 `attrgetter` 来获取属性。\n",
    "\n",
    "```python\n",
    "from operator import itemgetter\n",
    "\n",
    "rows = [\n",
    "    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n",
    "    {'fname': 'Brian', 'lname': 'Beazley', 'uid': 1002}\n",
    "]\n",
    "\n",
    "rows.sort(key=itemgetter('fname', 'lname'))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用生成器作为参数\n",
    "\n",
    "有的函数需要参数支持迭代器接口，比如下面的例子，此时，直接传入生成器会更好。而如果转换为列表，就需要创建一个只会使用一次的列表。\n",
    "\n",
    "```python\n",
    "sum(n * n for n in nums)  # good\n",
    "sum([n * n for n in nums]) # bad\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 在多个字典上查找\n",
    "\n",
    "如果需要在多个 `dict` 中查找某个键，当然可以以此在各个 `dict` 中去查找。或者将多个 `dict` 合并成一个。不过使用 `ChainMap` 可以得到类似于变量的作用域链那种结构。而且被链接起来的多个 `dict` 的变化会得到反映。\n",
    "\n",
    "```python\n",
    "from collections import ChainMap\n",
    "a = {'x': 1, 'z': 3}\n",
    "b = {'x': 2, 'y': 2}\n",
    "c = ChainMap(a, b)\n",
    "\n",
    "print(c['x']) # Outputs 1 (from a)\n",
    "print(c['y']) # Outputs 2 (from b)\n",
    "a['y'] = '4'\n",
    "print(c['y']) # Outputs 4 (from a)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---------------\n",
    "\n",
    "## 字符串和文本\n",
    "\n",
    "### 分割字符串\n",
    "\n",
    "切分字符串可以使用 `str.split` 但是它常常不够灵活性，因为如果指定分隔符，则分隔符只能为固定的一个。如果不指定，则默认以空白符分隔。\n",
    "\n",
    "```python\n",
    "text = 'a,b,c,d'\n",
    "text.split(',')\n",
    ">>> ['a', 'b', 'c', 'd']\n",
    "\n",
    "text = 'a b c d'\n",
    "text.split()\n",
    ">>> ['a', 'b', 'c', 'd']\n",
    "```\n",
    "\n",
    "使用 `re.split` 就会灵活很多。\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "text = 'a+b-c*d'\n",
    "re.split(r'[-+*]', text)\n",
    ">>> ['a', 'b', 'c', 'd']\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 日期和时间\n",
    "\n",
    "### 基本的日期与时间转换\n",
    "\n",
    "定义某个时间点，可以使用 `datatime`，定义时间偏移量，可以使用 `timedelta`。两个时间点相减可得到 `timedelta`，一个时间点加减一个 `timedelta` 可以得到一个新的时间点。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime.datetime(2019, 7, 14, 12, 0)"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datetime import datetime\n",
    "from datetime import timedelta\n",
    "\n",
    "start = datetime(2019, 7, 2)\n",
    "offset = timedelta(days=12, hours=12)\n",
    "start + offset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 字符串与日期之间的转换\n",
    "\n",
    "使用 `datetime.now` 可以得到当前时刻"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2019-07-12 19:34:53'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "datetime.now().strftime('%Y-%m-%d %H:%M:%S')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime.datetime(2019, 7, 12, 19, 34, 53)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = '2019-07-12 19:34:53'\n",
    "datetime.strptime(text, '%Y-%m-%d %H:%M:%S')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 迭代器与生成器\n",
    "\n",
    "### 实现迭代器接口\n",
    "\n",
    "一个对象为了实现迭代器接口，需要实现 `__iter__` 方法，此方法需要返回一个含有 `__next__` 方法的对象。通常在 `__next__` 方法需要在新的对象上实现，因为 `__next__` 方法需要用到一些状态变量，这需要存在一个对象中。为啥不能在当前对象上实现 `__next__` 呢，因为在对一个对象同时创建多个迭代器的时候，其中用到的状态就会冲突。\n",
    "\n",
    "这里创建了一个 `Stack` 类用来演示迭代器接口。实际中，当然不必如此繁琐。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Stack:\n",
    "    def __init__(self):\n",
    "        self.items = []\n",
    "\n",
    "    def pop(self):\n",
    "        return self.items.pop()\n",
    "\n",
    "    def push(self, item):\n",
    "        self.items.append(item)\n",
    "\n",
    "    def __iter__(self):\n",
    "        print('Stack.__iter__')\n",
    "        return StackIterator(self.items)\n",
    "\n",
    "    def __reversed__(self):\n",
    "        print('Stack.__reversed__')\n",
    "        return StackIterator(self.items, reverse=True)\n",
    "\n",
    "class StackIterator:\n",
    "    def __init__(self, items, reverse=False):\n",
    "        self.items = items\n",
    "        self.sp = len(items) - 1\n",
    "        self.bp = 0\n",
    "        self.reverse = reverse\n",
    "\n",
    "    def __iter__(self):\n",
    "        print('StackIterator.__iter__')\n",
    "        return self\n",
    "        \n",
    "    def __next__(self):\n",
    "        print('StackIterator.__next__')\n",
    "        if self.reverse:\n",
    "            return self.__last()\n",
    "        else:\n",
    "            return self.__next()\n",
    "    \n",
    "    def __next(self):\n",
    "        if self.sp < 0:\n",
    "            raise StopIteration()\n",
    "        else:\n",
    "            self.sp -= 1\n",
    "            return self.items[self.sp + 1]\n",
    "\n",
    "    def __last(self):\n",
    "        if self.bp < len(self.items):\n",
    "            self.bp += 1\n",
    "            return self.items[self.bp - 1]\n",
    "        else:\n",
    "            raise StopIteration()\n",
    "\n",
    "stack = Stack()\n",
    "stack.push(1)\n",
    "stack.push(2)\n",
    "stack.push(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`iter` 函数会调用对象的 `__iter__` 方法以返回迭代器。用迭代器作为参数调用 `next` 方法，会代理到 `__next__` 方法上，每次调用都会返回一个元素，具体迭代策略由具体的类来决定。当迭代完了所有元素时，抛出 `StopIteration` 异常以表示迭代结束。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stack.__iter__\n",
      "StackIterator.__next__\n",
      "3\n",
      "StackIterator.__next__\n",
      "2\n",
      "StackIterator.__next__\n",
      "1\n",
      "StackIterator.__next__\n"
     ]
    }
   ],
   "source": [
    "it = iter(stack)\n",
    "while True:\n",
    "    try:\n",
    "        print(next(it))\n",
    "    except StopIteration:\n",
    "        break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实现了迭代器接口的对象，可以使用 `list` 来将所有元素转换列表。`list` 会自动驱动迭代器。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stack.__iter__\n",
      "StackIterator.__next__\n",
      "StackIterator.__next__\n",
      "StackIterator.__next__\n",
      "StackIterator.__next__\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[3, 2, 1]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(stack)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "要想对一个对象实现反向迭代，可以实现 `__reversed__` 接口，此方法需要返回一个迭代器对象。`reversed` 函数会调用此方法。\n",
    "\n",
    "有时候需要使用下面的写法返回一个反向的列表，此时需要注意的是，`list` 希望参数支持迭代器接口，即含有一个 `__iter__` 方法，并返回一个含有 `__next__` 方法的对象。因此返回的迭代器对象也要有 `__iter__` 方法，在此方法中直接返回自身就可以了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stack.__reversed__\n",
      "StackIterator.__iter__\n",
      "StackIterator.__next__\n",
      "StackIterator.__next__\n",
      "StackIterator.__next__\n",
      "StackIterator.__next__\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[1, 2, 3]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(reversed(stack))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实际上只需要将接口代理到底层数据结构的对应接口上即可。\n",
    "\n",
    "```python\n",
    "def __iter__(self):\n",
    "    return iter(self.items)\n",
    "\n",
    "def __reversed__(self):\n",
    "    return reversed(self.items)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 对迭代器切片\n",
    "\n",
    "对于没有实现 `__getitem__` 和 `__len__` 接口的对象是不能进行常规的切片操作的。迭代器没有办法用下标进行索引，且长度未知，常规的切片无法作用于迭代器上，不过可以使用 `itertools.islice` 来实现对迭代器的切片。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 3, 5, 7]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from itertools import islice\n",
    "\n",
    "c = {1,2,3,4,5,6,7,8,9}\n",
    "\n",
    "[x for x in islice(c, 0, 8, 2)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 展开嵌套序列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4, 5, 6, 7, 8]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import Iterable\n",
    "\n",
    "def flatten(items):\n",
    "    for x in items:\n",
    "        if isinstance(x, list):\n",
    "            yield from flatten(x)\n",
    "        else:\n",
    "            yield x\n",
    "\n",
    "items = [1, 2, [3, 4, [5, [6]], 7], 8]\n",
    "\n",
    "list(flatten(items))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`iter` 有一个特性，它接受一个函数作为参数，`iter` 会不断调用此函数直到输出结果等于第二个参数为止。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n = 0\n",
    "def plus():\n",
    "    global n\n",
    "    n += 1\n",
    "    return n\n",
    "\n",
    "list(iter(plus, 5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python 语法细节\n",
    "\n",
    "### `classmethod` 与 `staticmethod`\n",
    "\n",
    "```python\n",
    "class Demo:\n",
    "    @classmethod\n",
    "    def klass_method(*args):\n",
    "        return args\n",
    "    @staticmethod\n",
    "    def static_method(*args):\n",
    "        return args\n",
    "```\n",
    "    \n",
    "`classmethod` 修饰的函数，其第一个参数为类本身而不是类的实例。\n",
    "\n",
    "```python\n",
    ">>> Demo.klass_method(1, 2, 3)\n",
    "<<< (__main__.Demo, 1, 2, 3)\n",
    "```\n",
    "\n",
    "通常 `classmethod` 需要在函数中来创建类，且第一个参数习惯上命名为 `cls`。\n",
    "\n",
    "```python\n",
    "class Demo:\n",
    "    def __init__(self, *args):\n",
    "        pass\n",
    "    @classmethod\n",
    "    def klass_method(cls, *args):\n",
    "        return cls(*args)\n",
    "```\n",
    "    \n",
    "staticmethod 修饰的函数，就是普通的函数，只是挂着类的名下罢了。`staticmethod` 修饰的函数，可以看做以类为命名空间的一系列函数，比如 `math` 下面的函数。\n",
    "\n",
    "```python\n",
    ">>> Demo.static_method(1, 2, 3)\n",
    "<<< (1, 2, 3)\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `global`\n",
    "\n",
    "`global` 用在函数内部，用来指出某个变量在 global 命名空间下，即为全局变量。在 Python 中，读取一个变量时，会先在局部作用域下查找，如果找不到再向外层作用域逐层查找。在一个变量赋值的时候，如果在局部作用域下找不到该变量，就会在局部作用域下创建此变量。\n",
    "\n",
    "如果在函数内部，希望对全局变量赋值，就需要明确指出该变量为全局变量，否则赋值的时候会创建新的局部变量。\n",
    "\n",
    "```python\n",
    "name = 'python'\n",
    "\n",
    "def change_name():\n",
    "    name = name.title()\n",
    "    \n",
    "change_name()\n",
    "```\n",
    "\n",
    "这段代码会出错，错误信息为 `UnboundLocalError: local variable 'name' referenced before assignment`，原因是函数 `change_name` 中，有对 `name` 的赋值操作，这会让 `name` 被视为局部变量，对未赋值的局部变量进行读取，就出了错误。\n",
    "\n",
    "\n",
    "```python\n",
    "def change_name():\n",
    "    global name\n",
    "    name = name.title()\n",
    "```\n",
    "\n",
    "使用关键字 `global` 来指明 `name` 为全局变量，就不会出错了。\n",
    "\n",
    "### `nonlocal`\n",
    "\n",
    "关键字 `nonlocal` 用在定义在函数体中的函数内。函数体内部定义的函数，可以使用外层作用域的变量。但依然存在 `global` 关键词要解决的那个问题。在对外部作用域的变量赋值的时候，此变量会被视为局部变量。\n",
    "\n",
    "```python\n",
    "def make_plus():\n",
    "    total = 0\n",
    "    \n",
    "    def plus(n):\n",
    "        nonlocal total\n",
    "        total += n\n",
    "        return total\n",
    "    return plus\n",
    "\n",
    "plus = make_plus()\n",
    "plus(1)\n",
    "```\n",
    "\n",
    "上面这个函数，如果没有 `nonlocal total` 这一行。那么执行到 `total += n` 时报错，因为这一行等价于 `total = total + n`，而 `total` 是局部变量，在没有赋值的情况下读取 `total` 会报错。`nonlocal` 关键词用来说明变量不是局部变量，需要在外层作用域去找。\n",
    "\n",
    "之所以需要 `global` 和 `nonlocal` 这两个关键词，是因为 Python 中声明变量和对变量赋值代码层面是一样的。而在其他语言中，声明变量会使用一些关键词，比如下面这样的：\n",
    "\n",
    "\n",
    "```js\n",
    "let name = '11'\n",
    "var n = 1\n",
    "int m = 2\n",
    "```\n",
    "\n",
    "如此，就可以区分开声明变量和对现有变量赋值，这两种操作了。Python 抛掉了这些，因此需要增加两个关键字来挽救。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 文件操作\n",
    "\n",
    "对文件每一行进行迭代，可以直接在文件上进行迭代，因为文件迭代器对文件内容按行迭代。\n",
    "\n",
    "```python\n",
    "with open('./list.txt', encoding='utf-8', mode='r') as fin:\n",
    "    for line in fin:\n",
    "        print(line) # 行尾含 `\\n`\n",
    "```\n",
    "\n",
    "readline 每次读取一行，如果没有内容了，会返回空行\n",
    "\n",
    "```python\n",
    "with open('./list.txt', encoding='utf-8', mode='r') as fin:\n",
    "    line = fin.readline()\n",
    "    while line:\n",
    "        print(line)\n",
    "        line = fin.readline()  # 行尾含 `\\n`\n",
    "    print(len(line))\n",
    "```\n",
    "        \n",
    "`read` 可以带有参数，参数指定读取的字符数量\n",
    "\n",
    "```python\n",
    "with open('./list.txt', encoding='utf-8', mode='r') as fin:\n",
    "    char = fin.read(1)\n",
    "    while char:\n",
    "        print(char)\n",
    "        char = fin.read(1)\n",
    "```\n",
    "\n",
    "\n",
    "如果不带参数，会一次性读取所有内容\n",
    "\n",
    "```python\n",
    "with open('./list.txt', encoding='utf-8', mode='r') as fin:\n",
    "    content = fin.read()\n",
    "    print(content)\n",
    "```\n",
    "\n",
    "`readlines` 返回一个由每一行构成的数组\n",
    "\n",
    "```python\n",
    "with open('./list.txt', encoding='utf-8', mode='r') as fin:\n",
    "    lines = fin.readlines()\n",
    "    print(lines)\n",
    "```\n",
    "\n",
    "\n",
    "**文件中某些行有编码错误**\n",
    "\n",
    "我在使用 pandas 读取 csv 文件时，pandas 报了如下错误：\n",
    "\n",
    "```\n",
    "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 3: invalid continuation byte\n",
    "```\n",
    "\n",
    "其原因是文件中某几行有编码问题，至于为什么，不得而知。解决方法是移除掉出错的行。\n",
    "\n",
    "```python\n",
    "def remove_bad_lines(in_file, out_file):\n",
    "    fin = open(in_file, mode='r', encoding='utf-8', errors='ignore')\n",
    "    fout = open(out_file, mode='w', encoding='utf-8')\n",
    "    for line in fin:\n",
    "        fout.write(line)\n",
    "    fin.close()\n",
    "    fout.close()\n",
    "```\n",
    "\n",
    "`errors='ignore'` 会忽略掉所有存在编码错误的行。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 多线程/多进程\n",
    "\n",
    "### ProcessPoolExecutor\n",
    "\n",
    "\n",
    "```python\n",
    "from concurrent.futures import ProcessPoolExecutor\n",
    "\n",
    "def run(items, worker, n_works=12):\n",
    "    groups = []\n",
    "    executor = ProcessPoolExecutor(max_workers=n_works)\n",
    "    group_size = int(len(items) / n_works + 1)\n",
    "    \n",
    "    for i in range(n_works):\n",
    "        start = i * group_size\n",
    "        end = start + group_size\n",
    "        groups.append(items[start: end])\n",
    "        \n",
    "    results_list = executor.map(worker, groups)\n",
    "    results_list = list(results_list)\n",
    "    \n",
    "    all_results = []\n",
    "    for results in results_list:\n",
    "        all_results.extend(results)\n",
    "\n",
    "    return all_results\n",
    "\n",
    "labels = run(samples, clf.predict)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
